
    On explaining machine learning models by evolving crucial and compact features

    Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has proven to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability, since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bound or minimized, even though this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) of small-enough number, to enable visualization of the behavior of the ML model; (2) of small-enough size, to enable interpretability of the features themselves; and (3) of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.
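
    To make the evaluation idea concrete, the following is a minimal sketch (not the paper's implementation) of wrapper-style scoring for a candidate set of constructed features: two hard-coded non-linear combinations stand in for GP-evolved expressions and are scored by cross-validating an ML model trained on them alone. The dataset, model, and the function name `evaluate_constructed_features` are illustrative choices.

```python
# Minimal sketch (not the paper's implementation): score a candidate set of
# constructed features by cross-validating an ML model trained on them alone.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def evaluate_constructed_features(X, y, feature_exprs, model):
    """Fitness of a candidate feature set: mean CV accuracy of `model`
    trained only on the constructed features."""
    Z = np.column_stack([expr(X) for expr in feature_exprs])
    return cross_val_score(model, Z, y, cv=5).mean()


X, y = load_breast_cancer(return_X_y=True)

# Two compact, readable expressions standing in for GP-evolved features.
candidate = [
    lambda X: X[:, 0] * X[:, 7],            # product of two original features
    lambda X: np.log1p(X[:, 20]) - X[:, 1],  # simple non-linear combination
]

score = evaluate_constructed_features(X, y, candidate, RandomForestClassifier(random_state=0))
print(f"5-fold CV accuracy using only 2 constructed features: {score:.3f}")
```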

    On the robustness of sparse counterfactual explanations to adverse perturbations

    Counterfactual explanations (CEs) are a powerful means for understanding how decisions made by algorithms can be changed. Researchers have proposed a number of desiderata that CEs should meet to be practically useful, such as requiring minimal effort to enact, or complying with causal models. In this paper, we consider the interplay between the desiderata of robustness (i.e., that enacting CEs remains feasible and cost-effective even if adverse events take place) and sparsity (i.e., that CEs require only a subset of the features to be changed). In particular, we study the effect of addressing robustness separately for the features that are recommended to be changed and those that are not. We provide definitions of robustness for sparse CEs that are workable in the sense that they can be incorporated as penalty terms in the loss functions used for discovering CEs. To carry out our experiments, we create and release code in which five data sets (commonly used in the field of fair and explainable machine learning) have been enriched with feature-specific annotations that can be used to sample meaningful perturbations. Our experiments show that CEs are often not robust and that, if adverse perturbations take place (even if not worst-case), the intervention they prescribe may require a much larger cost than anticipated, or even become impossible. However, accounting for robustness in the search process, which can be done rather easily, allows robust CEs to be discovered systematically. Robust CEs make the additional interventions needed to counteract perturbations much less costly than non-robust CEs do. We also find that robustness is easier to achieve for the features to change, posing an important point of consideration for the choice of which counterfactual explanation is best for the user. Our code is available at: https://github.com/marcovirgolin/robust-counterfactuals
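
    As an illustration of how robustness and sparsity can be folded into the search, the sketch below defines a CE loss with a sparsity penalty and a robustness penalty estimated from sampled adverse perturbations. The penalty weights, the perturbation sampler, and the binary-classification setup are our own assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of a CE search loss with sparsity and robustness penalties
# (not the exact formulation of the paper). Binary classification is assumed.
import numpy as np


def ce_loss(x, x_cf, model, sample_perturbations,
            lambda_sparsity=0.1, lambda_robust=1.0, n_samples=20):
    """Lower is better. `model.predict` returns class labels; the CE is valid
    if the candidate `x_cf` attains the class opposite to that of `x`."""
    target = 1 - model.predict(x.reshape(1, -1))[0]           # desired class
    validity = float(model.predict(x_cf.reshape(1, -1))[0] != target)
    sparsity = np.count_nonzero(np.abs(x_cf - x) > 1e-6)      # features changed
    # Robustness: fraction of sampled adverse perturbations under which the
    # perturbed CE no longer attains the desired class.
    perturbed = sample_perturbations(x_cf, n_samples)          # shape (n_samples, d)
    failures = np.mean(model.predict(perturbed) != target)
    return validity + lambda_sparsity * sparsity + lambda_robust * failures
```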

    Symbolic regression is NP-hard

    Symbolic regression (SR) is the task of learning a model of data in the form of a mathematical expression. By their nature, SR models have the potential to be accurate and human-interpretable at the same time. Unfortunately, finding such models, i.e., performing SR, appears to be a computationally intensive task. Historically, SR has been tackled with heuristics such as greedy or genetic algorithms and, while some works have hinted at the possible hardness of SR, no proof has yet been given that SR is, in fact, NP-hard. This raises the question: is there an exact polynomial-time algorithm to compute SR models? We provide evidence suggesting that the answer is probably negative by showing that SR is NP-hard.
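
    For readers unfamiliar with what such a hardness result formally addresses, one common way to state a decision version of SR is the following (our paraphrase; the paper's exact formulation may differ):

```latex
% A decision version of symbolic regression (paraphrased; the paper's exact
% formulation may differ).
\textsc{SR-Decision}: given a dataset $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$,
a set of operators $\mathcal{O}$ (e.g., $+, \times, \div$), a size bound $k$,
and an error threshold $\epsilon \geq 0$, does there exist an expression $f$
over $\mathcal{O}$ with at most $k$ symbols such that
$\frac{1}{n}\sum_{i=1}^{n} \bigl(f(\mathbf{x}_i) - y_i\bigr)^2 \leq \epsilon$?
```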

    An interpretable method for automated classification of spoken transcripts and written text

    We investigate the differences between spoken language (in the form of radio show transcripts) and written language (Wikipedia articles) in the context of text classification. We present a novel, interpretable method for text classification, involving a linear classifier that uses a large set of n-gram features, and apply it to a newly generated data set with sentences originating either from spoken transcripts or from written text. Our classifier reaches an accuracy less than 0.02 below that of a commonly used classifier (DistilBERT) based on deep neural networks (DNNs). Moreover, our classifier has an integrated measure of confidence for assessing the reliability of a given classification. An online tool is provided for demonstrating our classifier, particularly its interpretable nature, which is a crucial feature in classification tasks involving high-stakes decision-making. We also study the capability of DistilBERT to carry out fill-in-the-blank tasks in either spoken or written text, and find that it performs similarly in both cases. Our main conclusion is that, with careful improvements, the performance gap between classical methods and DNN-based methods may be reduced significantly, such that the choice of classification method comes down to the need (if any) for interpretability.
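
    The sketch below illustrates the general recipe of an n-gram linear classifier with a simple confidence proxy (the probability margin). The paper's feature set and integrated confidence measure may be defined differently, and the toy texts are invented for illustration.

```python
# Sketch of an interpretable n-gram linear classifier with a simple confidence
# proxy (the paper's own confidence measure may be defined differently).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for spoken transcripts (label 0) and written text (label 1).
texts = ["yeah so basically we talked about the game last night",
         "the encyclopedia article describes the historical context in detail"]
labels = [0, 1]

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),   # unigrams up to trigrams
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)

proba = clf.predict_proba(["so anyway that was a fun show"])[0]
confidence = abs(proba[1] - proba[0])      # margin between class probabilities
print(f"predicted class: {proba.argmax()}, confidence proxy: {confidence:.2f}")
```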

    Unveiling evolutionary algorithm representation with DU maps

    Evolutionary algorithms (EAs) have proven to be effective in tackling problems in many different domains. However, users are often required to spend significant effort on fine-tuning the EA parameters to make the algorithm work. In principle, visualization tools may be of great help in this laborious task, but current visualization tools are either EA-specific, and hence hardly available to all users, or too general to convey detailed information. In this work, we study the Diversity and Usage map (DU map), a compact visualization for analyzing a key component of every EA: the representation of solutions. In a single heat map, the DU map visualizes, for entire runs, how diverse the genotype is across the population and to which degree each gene in the genotype contributes to the solution. We demonstrate the generality of the DU map concept by applying it to six EAs that use different representations (bit and integer strings, trees, ensembles of trees, and neural networks). We present the results of an online user study about the usability of the DU map, which confirm the suitability of the proposed tool and provide important insights into our design choices. By providing a visualization tool that can be easily tailored by specifying the diversity (D) and usage (U) functions, the DU map aims to be a powerful analysis tool for EA practitioners, making EAs more transparent and hence lowering the barrier to their use.
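
    As a rough illustration of the idea, the sketch below computes a per-gene diversity value (bitwise entropy across the population) and a usage value (here simply the mean gene value) for each generation of a dummy bit-string EA and renders them as heat maps. The D and U functions are simple stand-ins, and the two quantities are shown side by side rather than combined into the single map described in the paper.

```python
# Illustrative DU-map-style heat maps for a bit-string EA: per generation, compute
# a diversity value D (entropy across the population) and a usage value U (mean
# gene value) for each gene. Both functions are our own simple stand-ins.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_generations, pop_size, n_genes = 50, 30, 20

diversity = np.zeros((n_generations, n_genes))
usage = np.zeros((n_generations, n_genes))
pop = rng.integers(0, 2, size=(pop_size, n_genes))

for g in range(n_generations):
    p1 = pop.mean(axis=0)                                    # frequency of 1s per gene
    entropy = -(p1 * np.log2(p1 + 1e-12) + (1 - p1) * np.log2(1 - p1 + 1e-12))
    diversity[g], usage[g] = entropy, p1
    # Dummy "evolution": randomly flip a few bits each generation.
    flips = rng.random(pop.shape) < 0.02
    pop = np.where(flips, 1 - pop, pop)

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, data, title in zip(axes, [diversity, usage], ["Diversity (D)", "Usage (U)"]):
    im = ax.imshow(data.T, aspect="auto", origin="lower", cmap="viridis")
    ax.set(title=title, xlabel="generation", ylabel="gene")
    fig.colorbar(im, ax=ax)
plt.tight_layout()
plt.show()
```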

    On genetic programming representations and fitness functions for interpretable dimensionality reduction

    Dimensionality reduction (DR) is an important technique for data exploration and knowledge discovery. However, most of the main DR methods are either linear (e.g., PCA), do not provide an explicit mapping between the original data and its lower-dimensional representation (e.g., MDS, t-SNE, isomap), or produce mappings that cannot be easily interpreted (e.g., kernel PCA, neural-based autoencoders). Recently, genetic programming (GP) has been used to evolve interpretable DR mappings in the form of symbolic expressions. There exist a number of ways in which GP can be used to this end, yet no study performs a comparison between them. In this paper, we fill this gap by comparing existing GP methods as well as devising new ones. We evaluate our methods on several benchmark datasets based on predictive accuracy and on how well the original features can be reconstructed using the lower-dimensional representation alone. Finally, we qualitatively assess the resulting expressions and their complexity. We find that various GP methods can be competitive with state-of-the-art DR algorithms and that they have the potential to produce interpretable DR mappings.
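
    The following sketch shows one way the reconstruction criterion mentioned above can be scored: a candidate 2D symbolic mapping (two hard-coded expressions standing in for GP-evolved ones) is evaluated by how well the original features can be recovered from the embedding via linear regression. The reconstruction model and the dataset are illustrative assumptions, not the paper's exact protocol.

```python
# Sketch of a reconstruction-based fitness for a candidate symbolic DR mapping:
# project the data with two expressions, then measure how well the original
# features can be recovered from the 2D embedding (here with linear regression).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def reconstruction_fitness(X, mapping_exprs):
    Z = np.column_stack([expr(X) for expr in mapping_exprs])  # n x 2 embedding
    X_hat = LinearRegression().fit(Z, X).predict(Z)           # reconstruct X from Z
    return r2_score(X, X_hat)                                 # higher is better


X, _ = load_iris(return_X_y=True)

# Two compact expressions standing in for GP-evolved mapping components.
candidate_mapping = [
    lambda X: X[:, 2] * X[:, 3],
    lambda X: X[:, 0] - np.sqrt(X[:, 1]),
]
print(f"reconstruction R^2: {reconstruction_fitness(X, candidate_mapping):.3f}")
```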

    Mini-batching, gradient-clipping, first- versus second-order: What works in gradient-based coefficient optimisation for symbolic regression?

    The aim of Symbolic Regression (SR) is to discover interpretable expressions that accurately describe data. The accuracy of an expression depends on both its structure and its coefficients. To keep the structure simple enough to be interpretable, effective coefficient optimisation becomes key. Gradient-based optimisation is clearly effective at training neural networks in Deep Learning (DL), which can essentially be viewed as large, over-parameterised expressions: in this paper, we study how gradient-based optimisation techniques as often used in DL transfer to SR. In particular, we first assess which techniques work well across random SR expressions, independent of any specific SR algorithm. We find that mini-batching and gradient-clipping can be helpful (similar to DL), while second-order optimisers outperform first-order ones (different from DL). Next, we consider whether including gradient-based optimisation in Genetic Programming (GP), a classic SR algorithm, is beneficial. On five real-world datasets, in a generation-based comparison, we find that second-order optimisation outperforms coefficient mutation (or no optimisation at all). However, in time-based comparisons, the performance gaps shrink substantially because the computational expense of second-order optimisation causes GP to perform fewer generations. The interplay of computational costs between the optimisation of structure and of coefficients is thus a critical aspect to consider.
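
    A minimal sketch of the setting, assuming PyTorch: the coefficients of a fixed expression f(x) = c0*sin(c1*x) + c2 are optimised with mini-batches and gradient clipping using a first-order optimiser (Adam); the second-order optimisers studied in the paper would take the place of the optimiser object. The expression and hyperparameters are illustrative.

```python
# Sketch: optimise the coefficients of a fixed SR expression, e.g.
# f(x) = c0 * sin(c1 * x) + c2, with mini-batching and gradient clipping.
# First-order Adam is used here; the paper also studies second-order optimisers.
import torch

torch.manual_seed(0)
x = torch.linspace(-3, 3, 512).unsqueeze(1)
y = 2.0 * torch.sin(1.5 * x) + 0.5 + 0.1 * torch.randn_like(x)  # noisy target

c = torch.tensor([1.0, 1.0, 0.0], requires_grad=True)            # coefficients
opt = torch.optim.Adam([c], lr=0.05)

for epoch in range(200):
    perm = torch.randperm(x.shape[0])
    for batch in perm.split(64):                                  # mini-batches
        opt.zero_grad()
        pred = c[0] * torch.sin(c[1] * x[batch]) + c[2]
        loss = torch.mean((pred - y[batch]) ** 2)
        loss.backward()
        torch.nn.utils.clip_grad_norm_([c], max_norm=1.0)         # gradient clipping
        opt.step()

print("fitted coefficients:", c.detach().numpy())
```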

    Deep Generative Symbolic Regression with Monte-Carlo-Tree-Search

    Symbolic regression (SR) is the problem of learning a symbolic expression from numerical data. Recently, deep neural models trained on procedurally generated synthetic datasets have shown performance competitive with more classical Genetic Programming (GP) algorithms. Unlike their GP counterparts, these neural approaches are trained to generate expressions from datasets given as context. This allows them to produce accurate expressions in a single forward pass at test time. However, they usually do not benefit from search abilities, which results in low performance compared to GP on out-of-distribution datasets. In this paper, we propose a novel method that provides the best of both worlds, based on a Monte-Carlo Tree Search procedure using a context-aware neural mutation model, which is initially pre-trained to learn promising mutations and is further refined from successful experiences in an online fashion. The approach demonstrates state-of-the-art performance on the well-known SRBench benchmark.
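
    The skeleton below sketches the generic MCTS loop that such an approach builds on, with UCB1 selection. Here `propose_mutations` stands in for the context-aware neural mutation model and `reward` for the data-fit of an expression; both stubs and the code overall are illustrative, not the paper's implementation.

```python
# Compact, generic MCTS skeleton: UCB1 selection over candidate expressions,
# where `propose_mutations` stands in for a (neural) mutation model and
# `reward` for how well an expression fits the data. Illustrative only.
import math
import random


class Node:
    def __init__(self, expression, parent=None):
        self.expression = expression
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    def ucb1(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return (self.total_reward / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))


def mcts(root_expression, propose_mutations, reward, n_iterations=100):
    root = Node(root_expression)
    for _ in range(n_iterations):
        # Selection: descend by UCB1 until a leaf is reached.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb1)
        # Expansion: ask the mutation model for candidate children.
        if node.visits > 0:
            node.children = [Node(m, parent=node) for m in propose_mutations(node.expression)]
            if node.children:
                node = random.choice(node.children)
        # Evaluation and backpropagation.
        r = reward(node.expression)
        while node is not None:
            node.visits += 1
            node.total_reward += r
            node = node.parent
    # Return the most-visited child expression (or the root if never expanded).
    return max(root.children, key=lambda n: n.visits).expression if root.children else root.expression
```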

    On the feasibility of automatically selecting similar patients in highly individualized radiotherapy dose reconstruction for historic data of pediatric cancer survivors

    Purpose: The aim of this study is to establish the first step toward a novel and highly individualized three-dimensional (3D) dose distribution reconstruction method, based on CT scans and organ delineations of recently treated patients. Specifically, the feasibility of automatically selecting the CT scan of a recently treated childhood cancer patient who is similar to a given historically treated child who suffered from Wilms' tumor is assessed.
    Methods: A cohort of 37 recently treated children aged between 2 and 6 yr is considered. Five potential notions of ground-truth similarity are proposed, each focusing on different anatomical aspects. These notions are automatically computed from CT scans of the abdomen and 3D organ delineations (liver, spleen, spinal cord, external body contour). The first is based on deformable image registration, the second on the Dice similarity coefficient, the third on the Hausdorff distance, the fourth on pairwise organ distances, and the last is computed by means of the overlap volume histogram. The relationship between typically available features of historically treated patients and the proposed ground-truth notions of similarity is studied by adopting state-of-the-art machine learning techniques, including random forest. Also, the feasibility of automatically selecting the most similar patient is assessed by comparing ground-truth rankings of similarity with predicted rankings.
    Results: Similarities (mainly) based on the external abdomen shape and on the pairwise organ distances are highly correlated (Pearson rp ≥ 0.70) and are successfully modeled with random forests based on historically recorded features (pseudo-R2 ≥ 0.69). In contrast, similarities based on the shape of internal organs cannot be modeled. For the similarities that random forest can reliably model, an estimation of feature relevance indicates that abdominal diameters and weight are the most important. Experiments on automatically selecting similar patients lead to coarse, yet quite robust results: the most similar patient is retrieved only 22% of the time; however, the error in worst-case scenarios is limited, with at worst the fourth most similar patient being retrieved.
    Conclusions: Results demonstrate that automatically selecting similar patients is feasible when focusing on the shape of the external abdomen and on the position of internal organs. Moreover, whereas the common practice in phantom-based dose reconstruction is to select a representative phantom using age, height, and weight as discriminant factors for any treatment scenario, our analysis on abdominal tumor treatment for children shows that the most relevant features are weight and the anterior-posterior and left-right abdominal diameters.
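
    Two of the ground-truth similarity notions mentioned above are straightforward to compute; the sketch below shows the Dice similarity coefficient between binary organ masks and the symmetric Hausdorff distance between delineation points, on toy 2D inputs (the study itself works on 3D CT delineations).

```python
# Sketch of two of the similarity notions mentioned above: the Dice similarity
# coefficient between binary organ masks and the symmetric Hausdorff distance
# between organ contour points. Toy 2D inputs; the study uses 3D CT delineations.
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def dice_coefficient(mask_a, mask_b):
    intersection = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * intersection / (mask_a.sum() + mask_b.sum())


def hausdorff_distance(points_a, points_b):
    return max(directed_hausdorff(points_a, points_b)[0],
               directed_hausdorff(points_b, points_a)[0])


# Toy binary masks (e.g., one organ delineation per patient, here 2D for brevity).
a = np.zeros((64, 64), dtype=bool); a[10:40, 10:40] = True
b = np.zeros((64, 64), dtype=bool); b[15:45, 12:42] = True
print(f"Dice: {dice_coefficient(a, b):.3f}")

# Delineation points as N x 2 coordinate arrays.
pa = np.argwhere(a)
pb = np.argwhere(b)
print(f"Hausdorff distance: {hausdorff_distance(pa, pb):.1f} voxels")
```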